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We introduce an extension to the Weighted Ensemble (WE) path sampling method to restrict sampling to 
a one dimensional path through a high dimensional phase space. Our method, which is based on the finite- 
temperature string method, permits efficient sampling of both equilibrium and non-equilibrium systems. 
Sampling obtained from the WE method guides the adaptive refinement of a Voronoi tessellation of order 
parameter space, whose generating points, upon convergence, coincide with the principle reaction pathway. 
We demonstrate the application of this method to several simple, two-dimensional models of driven Brownian 
motion and to the conformational change of the nitrogen regulatory protein C receiver domain using an elastic 
network model. The simplicity of the two-dimensional models allows us to directly compare the efhciency 
of the WE method to conventional brute force simulations and other path sampling algorithms, while the 
example of protein conformational change demonstrates how the method can be used to efficiently study 
transitions in the space of many collective variables. 



I. INTRODUCTION 

Molecular simulations can provide deep insight into 
the mechanisms of physical processes, in part, because 
they inherently possess a spatial and temporal resolution 
that is unmatched by most experimental techniques. Un- 
fortunately, many physical and biological processes such 
as chemical reactions, nucleation, protein conformational 
changes, and ligand binding occur on timescales that are 
inaccessible to conventional brute-force simulation meth- 
ods. Due to this shortcoming, there has been broad in- 
terest in developing methods that augment conventional 
simulations to allow them to capture rare events in a 
reasonable time fram^^iU^ several of which are reviewed 
in Ref. [THHl Almost all of these methods rely on en- 
hancing sampling in a reduced set of collective coordi- 
nates that span the important regions of the high dimen- 
sional phase space. The computational effort is thereby 
focused on sampling transition regions, which would oth- 
erwise be visited infrequently, if at all, in conventional 
simulations. The success of rare event sampling often 
hinges on the particular progress coordinate or order pa- 
rameter used to discriminate movement along the tran- 
sition, and choosing the proper progress coordinate is a 
non-trivial task. In fact, it is likely that most processes 
can only be described by multiple progress coordinates, 
which further complicates identification of the appropri- 
ate pathway through phase space. The need to incorpo- 
rate additional progress coordinates also drastically in- 
creases the computational demand since the cost scales 
like the power of the number of progress coordinates. 

Even if a transition occurs via a convoluted set of steps 
requiring multiple progress coordinates to describe the 
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process, it is generally thought that the reaction can be 
meaningfully represented by a chain of connected nodes 
or beads (referred to as images) that define a patlP^Jti^l 
Methods built around this idea provide an elegant so- 
lution to the increased phase space needed to explore 
multiple directions because they are able to track an ar- 
bitrary number of progress coordinates while restraining 
the sampling to effectively one dimension. One such real- 
ization of this approach is the string methocP^. Since its 
advent, a number of variants of the string method have 
been developed to ev aluate the transition pathways of 
complex system^SEH Additionally, the basic framework 
of the string method has been integrated with a variety 
of other sampling procedureJ^^'^. While string meth- 
ods have been applied to a diverse set of problemPHHUl^ 
reactions that occur via many, conformationally distinct 
pathway ^23111 jjiay be poorly suited for string methods. 
Studies have shown, however, as we do here, that when 
the transition is confined to a few reaction channels, 
which can be explicitly accounted for, string methods 
still perform well. 

Here, we merge a rare event sampling method known 
as the Weighted Ensemble (WE) methocP^ with a string 
method, and we show that this combined approach per- 
forms both accurately and efficiently. WE sampling is 
a rigorous method for sampling both equilibrium and 
non-equilibrium systems even in the presence of long- 
lived intermediate state^^MSI]^ From a practical perspec- 
tive, WE sampling is easy to implement and straightfor- 
ward to parallelize across large computational resources. 
The WE method has been previously used with Brow- 
nian dynamics simulations to study protein bindin^^HM] 
and protein foldin^^H^ ^nd it has been used with explicit 
solvent molecular dynamics to investigate molecular as- 
sociation event^^. Our group, and others, have also 
extended the method to examine large-scale conforma- 
tional transiti ons in b iomolecular systems using coarse- 
grained model^^MIMl^ Augmenting WE sampling with 
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a string method yields not only the transition pathway 
represented by the converged string, but also a dynami- 
cally exact representation of the path ensemble along the 
string, from which steady-state distributions and kinetic 
information can be extracted from a single simulation. 

We illustrate the WE-based string method by apply- 
ing it to several example systems of varying degrees of 
complexity. First, we examine two different nonequi- 
librium steady-state processes. The first of these is a 
Brownian particle in a unidirectional flow on a periodic 
two-dimensional surface^^. Second, we examine a driven 
Brownian particle on a two-dimensional potential sur- 
face with intermediate metastable states along distinct 
forward and backward pathwayi^Sl. Finally, we apply the 
method to the equilibrium conformational transition of 
the nitrogen regulatory protein C rece iver domain using 
a two-state elastic network modeP^'i^. All three systems 
have been previously studied with other string methods, 
facilitating a direct comparison with the WE-based algo- 
rithm. Moreover, the low dimensionality of the first two 
examples makes it possible to rigorously sample each sys- 
tem with conventional simulation to test the accuracy of 
our method and serve as an additional benchmark for 
comparing specif ically a gainst nonequilibrium umbrella 
sampling (NEUS^PaMl. Both WE and NEUS methods 
are able to calculate steady-state rates and distributions, 
even in the presence of long-lived intermediates, and the 
clarity of the framework developed in Ref s . [T^ and f5FI for 
comparing NEUS to conventional sampling has enabled 
us to include WE in this comparison. In all cases, the 
WE-based string method obtains high-fidelity estimates 
of steady-state distributions and rates using comparable, 
or less, computational resources than other string based 
methods. Thus, the procedure that we present here lays 
the groundwork for applying WE sampling to a broad 
range of problems in which many progress coordinates 
are required to efficiently partition phase space along a 
transition pathway. 



replicas of equal weight with identical coordinates, but 
randomly assigned velocities. Each replica is then sim- 
ulated independently for a short time interval r during 
which it is allowed to explore conformational space fol- 
lowing its natural dynamics. At the end of this interval, 
every replica is assigned to a bin based upon its instan- 
taneous coordinates at the end of the time interval. 

While assigning the configurations of the system into 
bins could be done using the full configurational space 
of the system, in practice the progress coordinates repre- 
sent a set of collective coordinates in a reduced sub-space. 
Let X specify the configuration of the full system, and let 
6{x) be the mapping of x into the progress coordinate 
space. The progress coordinates must be chosen to dis- 
criminate the product and reactant states, but need not 
specify a true reaction coordinate. 

Within each bin, the number of replicas is held con- 
stant, or nearly constant, by occasionally replicating or 
terminating the replicas in that bin. If an occupied bin 
contains fewer than the target number of replicas, one 
or more of the replicas are selected via a statistical pro- 
cedure and are split into M new copies that each carry 
a fraction 1/M of the probability of the parent. Con- 
versely, if a bin contains more than the target number of 
replicas, a culling procedure removes excess replicas. The 
weight of the culled replicas is redistributed to a subset 
of the remaining copies in that bin. These changes to 
the number of replicas within a bin are carried out in 
a way that does not bias the underlying dynamic d^^ l ^° l 
Metastable regions that would typically accumulate large 
numbers of replicas and consume a large fraction of the 
computational effort required to simulate the system are 
not over-populated because of the culling process. Con- 
versely, the conformational space atop a barrier between 
states that would normally be poorly sampled using con- 
ventional simulations is enriched with many low-weight 
replicas. Details of the replication and termination pro- 
tocol can be found elsewhere^. 



II. METHODS 

A. Weighted Ensemble path sampling 

Weighted Ensemble sampling is a ge neral and rig- 
orous method for simulating rare event^^filzg]^ This is 
accomplished via a resampling protocol that partitions 
the progress coordinate space of the system into non- 
overlapping bins that can be defined in an arbitrary num- 
ber of dimension^sS!^ Here, resampling refers to a scheme 
that generates an alternative, but equivalent, statistical 
sample of a system's phase space, which in the context 
of WE sampling is accomplished as follows. 

Multiple replicas of the system are initiated from some 
initial distribution of conformations. Each replica is as- 
signed a weight, such that the sum of the weights of all of 
the replicas in the system is unity. In the simplest case 
this initial distribution of replicas may consist of Nrep 



B. String Method 

As in previous WE simulations, phase space is divided 
into non-overlapping regions or bins defined by a set of 
progress coordinates. However, instead of a partition 
with the number of dimensions of the progress coordinate 
space, the bins are constructed as a string of Voronoi cells 
in a single dimension (the arclength along the string), 
embedded in the higher dimensionality progress coordi- 
nate space. The string then evolves in time based on the 
dynamics of the replicas to follow the principle pathway 
through the conformational space. Our implementation 
of a string method, adapted for WE sampling, closely fol- 
lows the algorithm of the finite-temperature string (FTS) 
method suggested by Vanden-Eijnden and VenturolP^. 

First, an initial path is constructed between two re- 
gions of phase space using a set of progress coordinates 
that are assumed to be sufRcient to describe the transi- 
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tion between regions. A discretized set of evenly spaced 
images along the path, yja, partition the progress coor- 
dinate space into bins, Sq,. The subscript a = 1, iVim, 
where A^im is the total number of images along the string. 
The string can be thought of as a spline with the images 
being the nodes connecting the individual segments. In 
practice, we use a linear interpolation to initialize the 
string with evenly spaced images. Each replica specified 
by X is mapped into progress coordinate space via the 
transformation 9{x) and then assigned to the closest bin 
Ba by calculating the distance to all images (fa- The 
image ipa is therefore the generator of the Voronoi cell 
corresponding to the bin Ba. Distance measurements in 
progress coordinate space require the use of the appropri- 
ate metricPSJ however, once the metric has been defined, 
assigning a replica to a bin is straightforward, regardless 
of the dimensionality of that space. A detailed discussion 
of how to estimate the metric for a set of collective co- 
ordinates is given in Ref. 1181 The boundaries separating 
bins, which can be quite complicated in high-dimensional 
spaces, need not be computed. 

The string images are then adaptively updated accord- 
ing to the following algorithm: 

1. Simulate the system to explore phase space for a 
time Tmovej which is generally an integral number of 
r. During this period, replicas are free to move be- 
tween bins, unlike in the finite-temperature s tring 
method^^ or nonequilibrium umbrella sampling^l^. 

2. Compute the average position of all replicas in each 
bin over the time Tavg according to: 

where i indicates the j*'* of Nr replicas simulated 
in this interval, Wi is the weight of replica i and 
ha{9{xi)) is an indicator function for Voronoi cell 
a, which assumes a value of 1 if replica i is in bin 
a, and otherwise. 

3. Update the images along the strings, moving them 
toward the average positions within each bin ac- 
cording to: 



This final step prevents the images from drifting to- 
ward nearby metastable states, which would com- 
promise the efficient sampling of the transition re- 
gions. Relabel the shifted as f^'^^- 

Steps 1-4 are iterated until the images defining the 
string are stationary. 

While this rough outline defines the method, there are 
additional details employed in the current study. For in- 
stance, system configurations, Xi, used to identify bin 
centers in Eq. [l]are taken as the coordinates at the end 
of each WE simulation of length r. While this choice 
of averaging works well for the systems presented here, 
more or less frequent configuration snapshots should be 
averaged depending on the length of r compared to the 
natural decorrelation time of the system within a bin. 
The key is to ensure statistical independence of configura- 
tions, while not averaging too infrequently. Additionally, 
using only the last Tavg/ iterations in calculating the av- 
erage position of replicas within a bin limits the effect of 
early sampling near the initial position of the string. This 
is desirable since the weight of replicas within each bin 
can change rapidly early in the simulation before reaching 
steady state, thus biasing the calculation of the average. 

Smoothing the string as it evolves is essential to 
prevent kinks and other pathological shapes from 
developing^"^ . We accomplish smoothing through manip- 
ulating the term r* in Eq. [2] using one of two different 
methods. First, for the protein conformational change 
example (Sec. IIIC), we define 
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for a = 1. . .iV - 1, k" = KNi.mC and r* = = 0. This 
term adds an effective elasticity to the string that pre- 
vents the image from moving to {6a) if it causes large 
kink angles between the current image and its neighbors. 
The parameter k, which is positive definite, controls how 
aggressively the string is smoothed^*. Alternatively the 
updated positions can be smoothed using a multidimen- 
sional curve fitting procedur^^D^ This procedure was used 
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for the examples in both Sec. Ill A and IIIB First, (p* 
are calculated with r* = for all a. Then a smooth 
continuous path connecting y>g and ip*j^ is generated by 
fitting 



where (p* are the updated images, are the pre- 
vious position of the images, ( > controls how 
rapidly the image moves toward the current esti- 
mate of the average position in the bin, and r* is 
a smoothing term discussed below. 

4. Redistribute images uniformly along the arc length 
of the string by fitting a piece-wise linear spline 
through all yj* and then spacing images equidis- 
tantly along the curve. This reparameterization of 
the string ensures for a given update iteration, n, 



A^dim P 

i=l j=l 



(5) 



to by varying A over the range [0,1]. Here, -/Vdim 
is the dimensionality of the progress coordinate space. 



Bi is the unit vector of the i*'' progress coordinate and 
CTi J- are the coefficients of P sinusoidal basis functions in 
each dimension. The superscript cur indicates that if'^'^^^ 
on the left hand side is calculated by the curve fitting 
(3) procedure. The parameters ai,j and Aq are selected to 
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minimize 



(6) 



0=0 



Details of the optimization procedure to minimize are 
given in the supplementary materials of Ref. 1361 



C. Re-weighting Scheme 

In the original formulation of the WE method, the 
initial distribution of weights gradually redistributes 
throughout the sampled conformational space and 
reaches either an equilibrium or non-equilibrium steady 
state after some finite number of iterations^. In the pres- 
ence of metastable intermediates, the time for the system 
to relax to the correct steady-state distribution of weights 
can be very long, which hampers the efficiency of the 
methocF^. It was shown previously that this relaxation 
time can be dramatically reduced through a re-weighting 
procedure that used estimates of the rate of transitions 
between bins to solve for the steady-state distribution^. 
Briefly, since the dynamics of the individual replicas in 
a WE simulation arc unbiased, the transition rates be- 
tween the bins can be used to estimate the elements of 
a transition matrix k. A complete specification of the 
transition rates allows one to solve for the steady-state 
probabilities using 



dt 



(7) 



where Pi is the probability associated with bin z, and /c^j 
is the rate of transitions from bin i to j and = 1. 

The new estimate of Pi is then used to re-weight the 
replicas in bin i, where the weight is distributed among 
the replicas in the bin in proportion to their weights be- 
fore re-weighting. For the example in Sec. [Ill C[ which 
involves equilibrium sampling, we use a different re- 
weighting procedure that uses detailed balance to con- 
strain the estimate of the steady state distributiorP^. 

Unlike previous WE studies^^ '^^ , which employed a sin- 
gle re-weighting step based on a short period of initial 
sampling, we carry out multiple re- weighting steps at pe- 
riodic intervals T^w as suggested in Refs. |3SJ The 
rates are calculated over a window spanning Tj-avg simu- 
lation steps. In practice we use a fractional window that 
extends from the current simulation time back Travg/i" it- 
erations of the WE resampling procedure, which discards 
inaccurate contributions to the rates early in the simula- 
tion. That fraction is fixed during the re- weighting phase 
to Ti-aYgZ-^rT, where Nt is the current iteration number. 

When the steady state re-weighting method is used in 
a simulation, we separate our simulation protocol into 
distinct phases. We apply the re- weighting protcol only 
during the initial phase, early in the simulation, since it 
efficiently relaxes the system away from the initial inac- 
curate distribution of weights toward the correct steady 



state distribution. We find, however, that the standard 
WE method obtains a higher overall level of accuracy as 
steady state is approached. The reason for this increased 
accuracy is related to statistical net zero flux between 
bins emerging naturally at steady state with standard 
WE, rather than being enforced based on potentially in- 
accurate rates determined by the re-weighting scheme. 
Similar observations were made by Dickson, et al. for 
the NEUS method, and they also only use a global re- 
weighting scheme early in the simulationJ^. 



D. Calculating rates 

It is of interest to use simulation to calculate the rates 
for a system to interconvert between distinct states A 
and B. This can be achieved from very long conventional 
simulations by observing many transitions between A and 
B and separating the A B transitions from the B — >■ 
A transitions. Explicitly, at some time t, a trajectory 
that had last been in state A is assigned to state A {Sa) 
at time t; if it had last been in state B, it is labeled 
as belonging to state B (Sb)- Given this partition, the 
reaction rate from state A ^ B or from _B — >■ A is given 



kA.B = lim 

T-i-oc 



-,kB,A = hm — — , 

T— i-oo 



Ta 



'-B 



(8) 



where Ta and Tb are the total time that a trajectory 
has been assigned to state A or B, respectively, during 
the interval [0,T]. During the same interval, N'^ g is the 
number of times a trajectory assigned to A switched to 
being assigned to B, and Ng enumerates the reverse 
transition. 

An alternative but equivalent definition of the reaction 
rate based on fluxes into A and B is given bjEl 



B\Sa 
Ha 



,k 



B,A 



A\Sb 



(9) 



where ^b\Sa (^aiSb) the flux into state B (A) from 
trajectories that originated in state A [B), and Ha (fiB) 
is a history dependent indicator function that is equal to 
1 if the trajectory was more recently in state A [B) than 
in B {A), and zero otherwise. The indicator function 
serves to assign trajectories to either state A or i? as 
above. Overbars indicate a time averaged quantity. 

The original formulation of the Weighted Ensemble 
method, in which replicas originating from a reactant 
state (state A) were reintroduced into that state upon 
crossing the product surface (state S), effectively isolated 
members of the trajectory assigned to a single state^® ^S, 
In principle, WE does not require one to run simulations 
with this feedback scheme, since it properly accounts for 
a dynamical trajectory's historji23. Thus, a state can be 
unambiguously assigned (assuming that all trajectories 
are initiated from A or B - otherwise there will be some 
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transient period when a trajectory is unassigned) to ev- 
ery rephca enabhng one to calculate the requisite fluxes 
in Eq. |9] 

Dickson and colleagues introduced a dual-direction 
scheme that uses an extended bin space to split the for- 
ward and backward path ensemblea^^. In the standard 
nomenclature of a WE simulation with iVdim progress 
coordinates, one adds an additional progress coordinate 
which labels each replica as either being assigned to the 
product or reactant state. For the Voronoi-based divi- 
sion of the progress coordinate space used in this study, 
this additional progress coordinate modifies the distance 
metric used to assign a set of coordinates to a bin: 



if X,n+1 Vl.m+l 



Yl{^i~'^aif otherwise. ^^'^^ 

i=l 



A replica who had last visited A is then infinitely far from 
the centers of the string associated with state B. During 
a WE simulation then, ^b\Sa calculated as the flux of 
probability from all Voronoi cells in Sa into the region 
defining state B. 



E. Implementation 

We have implemented all of the systems in Sec. |III| 
using open-source software written in Python. The 
string method was implemented as a plugin for the 
Weighted Ensemble Simulation Toolkit with Paralleliza- 
tion and Analysis (WESTFAJPSl^ which provides a gen- 
eral framework for performing and analyzing WE sim- 
ulations. The complete set of tools required to simu- 
late the example systems, analyze the results and gen- 
erate the figures found in this study are available at 
https : / /simtk. org/home/westring[ 



flux across the periodic boundary. A particle on the po- 
tential evolves according to the over damped equation of 
motion in the presence of a random stochastic fluctua- 
tion. The discretized form of the equation is 



Xit + dt)=X{t)-—(VxV 



where X — {x,y), St is the time step, SX'~^ is a random 
displacement with zero mean and variance 2Ddt. The 
diffusion coefficient, D = (rn/3^)~^ is defined in terms of 
the mass of the particle, to, the inverse temperature, /3 
and the friction coefficient, ^. 

In this work, St = 0.002, ^ = 1.5, F = 1.8, P = 4.0, 
TO = 1.0 and 7 = 2.25. Parameters related to the string 
parameterization and WE sampling are given in Table [T] 
The value of a modulates the height of the barrier span- 
ning the periodic boundary. Using two different values 
of a, we consider a case where transitions are common 
(a = 1.125) and another where the barrier is higher and 
transitions are rare (a = 2.25). These parameters were 
selected to match the values used in Ref. [T9l a lthough 
they differ from the values originally reportecP^. 

Figure [T] shows the converged string for the common 
transition case after lOOOr. The string was initialized as 
a vertical line between (0,0.05) and (0,0.95). Updates 
to the string were performed using the multidimensional 
curve fitting procedure (Eq. [5| with P = 2. For both 
conventional and WE simulations, we generate a projec- 
tion of the steady-state probability distribution onto the 
y axis as the simulations progress in time. These projec- 
tions are then used to analyze the convergence properties 
of each simulation method compared to a well-converged 
target distribution. The discrepancy between the target 
distribution and the simulated distribution accumulated 
to a particular time point on a logarithmic scale is 
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error 



(13) 



III. EXAMPLES 

A. Periodic two-dimension system 

As an initial test of the string-based Weighted En- 
semble sampling method, we apply it to a periodic two- 
dimensional system^^ with a potential surface defined by 



where 



x - i sin(27r2/) 



acos(27ry), (11) 



where x and y are the spatial coordinates and a and 7 
are parameters that determine the shape of the potential. 
The form of the potential creates a reaction pathway that 
depends nontrivially on both coordinates, x and y. 

Periodic boundary conditions are applied in the y di- 
rection such that y G [0,1). A constant external force 
{Fext = Fy) is applied along the y direction, which drives 
the system out-of-equilibrium and generates a constant 



E,. 



log P{{) 
log 1/T 



^logPt(^) 
log Pt{i) 



P(i) ^ 
P{i) = 



(14) 



Here, P{i) is the normalized steady-state probability dis- 
tribution projected onto the y axis, Pt{i) is the target 
distribution and T is the total time of the complete sim- 
ulation. The histograms used to construct the distribu- 
tions are obtained by dividing the y axis between and 
1 into n — 100 windows of equal width. The alternative 
definition of the error for P{i) = is necessary to ensure 
that the error is still finite when bins have yet to accu- 
mulate any samples in them. The impact of this choice is 
only significant for the conventional sampling case when 
a = 2.25 since these simulations require a non-negligible 
amount of time to reach the windows at the top of the 
higher barrier. In both sets of WE simulations, all of the 
histogram windows are populated almost immediately. 
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even if the initial estimate of the probabihty is inaccu- 
rate. 

The final distributions show excellent agreement be- 
tween the WE simulations and the long conventional tar- 
get simulations for both values of a (Fig. [2]) . At interme- 
diate times, Fig. [3] shows that for both choices of a, the 
WE simulations converge to the correct distribution more 
rapidly than the respective conventional simulations. For 
a = 1.125, the conventional simulations take approxi- 
mately five times longer to converge to the same level 
of accuracy as the WE simulations. For the case where 
a — 2.25, the barrier is high enough that the conventional 
simulations require a significant number of steps to accu- 
mulate sampling in every histogram window. During this 
phase in which a particle in the conventional simulations 
has not visited every window, the error is dominated by 
the term in Eq. 14 corresponding to P{i) = 0, decreasing 
slowly until all of the bins have been visited. The WE 
simulations, conversely, populate every bin almost imme- 
diately and the total error decreases much more rapidly. 
As such, the methods display quite different convergence 
characteristics for the high barrier case, with the WE 
simulations converge to a specific error in at least an 
order-of-magnitude fewer steps than conventional simu- 
lations, and significantly faster for certain error values. 



B. Two-dimensional system witli two pathways 

As a second example, we consider a two-dimensional 
ring-shaped potential with two distinct pathwayJ^Sl con- 
necting a pair of metastable states. A transition from one 
state to the other requires passing through a metastable 
intermediate, which makes this system more difficult to 



sample than the model examined in HI A Many complex 
systems contain metastable intermediates along the tran- 
sition path, so such a test becomes important in ensuring 
the procedure's broader applicability. The potential sur- 
face for this model is defined as 

V{r, 9) ^ a{r - + xi cos(26') ~ X2 cos(46l). (15) 

Here, r = (x^+y'^y/^ and 6 is the angle in radians mea- 
sured counterclockwise from the x axis. Particles on this 
surface evolve according to Eq. [12] with the same defini- 
tion of the noise term and diffusion constant. A constant 
external force, Fext = —F6/r, drives the system out of 
equilibrium in a clockwise directiorP^l. The parameters 
governing the dynamics and shape of the potential sur- 
face were selected to match those used in Ref . 35 , where 
a = 3.0, 7 = 3.0, XI = 2.25, X2 = 4.5, ^ = 1-5, F - 7.2, 
and St = 0.005. The inverse temperature is varied be- 
tween /3 = 1.0 and 3.0. On this surface, we define two 
states, A and B, which encompass the area within circles 
of radius R = 1.0 centered at (—7,0) and (7,0), respec- 
tively. The external force creates two distinct pathways 
for the forward (A-^B) and backward (B-^A) transitions 
(Fig.|4f. 



Parameters associated with the WE protocol for each 
inverse temperature are given in Table [TTl and each string 
is initialized as a horizontal line connecting the centers of 
A and B. A total of iVrop replicas are initiated at each of 
the positions (—7,0), (7,0), (0,-7) and (0,7), and are 
assigned a weight of (1 — 3(5)/A^rep at (—7, 0), and S/N^cp 
otherwise, where S = 1 x lO"""^^. 

The presence of metastable intermediate states cen- 
tered at (7, 0) and (0, —7) along each path requires the 
use of the re-weighting scheme described in Section |II C| 
to efficiently converge to the correct non-equilibrium 
steady state distribution, as our choice of initial weights 
starts the system far from steady state. Projections of 
the steady state distribution for values of /3<2.5 onto 
the angular coordinate 9 are shown for both conven- 
tional (CONV) and weighted ensemble (WE) simulation 
in Fig. [5j There is excellent correspondence between the 
distributions obtained from the conventional simulations 
and the WE method across the entire temperature range 
in terms of both the density in the metastable states as 
well as the barrier regions. 

For this model system, we analyzed the convergence 
behavior of the reaction rates instead of the steady state 
distribution, following Ref. [35l Separating the ensemble 
of replicas transiting in each direction onto two strings 
using the dual-direction scheme described in Section II D| 



allowed us to calculate the reaction rate using Eq. [9 We 
then analyzed the performance of the WE method in cal- 
culating the reaction rate between states A and B. The 
WE simulations are compared against conventional sim- 
ulations in which the rate is calculated using Eq. [Sj The 
error in the calculated rate is measured as 



error(t) = | logfc(t) - logfct|, 



(16) 



where kt is the target rate constant and k{t) is the rate 
as measured at a time t after the start of the simula- 
tion. Target rate constants for /3<2.5 were calculated by 
averaging the rates obtained from ten conventional simu- 
lations, each longer than 200 times the mean first passage 
time (MFPT). For /3 — 3.0, the target rate constant was 
obtained by fitting the rates from the higher temperature 
target simulations to an Arrhenius form and extrapolat- 
ing. Errors as a function of aggregate simulation time 
are plotted for /3 — 1.5 and 2.5 in Fig. |6]and are the root 
mean squared averages of errors obtained from ten simu- 
lations. In the case of the conventional simulation, errors 
and target rates are calculated from independent simula- 
tions. The curves corresponding to the WE simulations 
are a moving average over the five previous iterations, 
and rates are based on a moving average with a window 
size of 200r. 

The performance of the WE simulations over the en- 
tire temperature range is shown in Fig. [7] by plotting 
Tx/MFPT as a function of (3. The amount of simu- 
lated time, Tx, required to obtain an error equal to X 
is calculated using Eq. [16) Measuring the error, X, on a 
logarithmic scale, Ti corresponds to the amount of time 
required to obtain an order-of-magnitude estimate of the 
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rates (error ^ 10"^), while T0.5 is the time required to get 
an estimate that is within a factor of three of the target 
(error - 10"-^ - 3). When Ti/MFPT = 1, the amount of 
time required to obtain an order-of-magnitude estimate 
of the rate is approximately equal to the amount of time 
required by a conventional simulation. 

Figure [T] indicates that the WE method is signifi- 
cantly more efhcient than the corresponding conventional 
simulation at all temperatures except for the highest 
(/3 = 1.0), in which particles on the potential can eas- 
ily cross the barriers between states. For /3 = 1.0, the 
time required to relax away from the initial distribution 
of weights (approximately a 6 function of probability in 
state A) and to the correct steady state probability dis- 
tribution is of the same order of time as the MFPT. The 
comparative efficiency of WE to conventional sampling 
increases with decreasing temperature (increasing bar- 
rier height); WE obtains an order-of-magnitude estimate 
of the rate in nearly four orders-of-magnitude less time 
than the MFPT for /? = 3.0. 



C. Elastic Network Model of the NTRC protein domain 



A and B, respectively. These single- well potentials are 
combined by exponential averaging, where the parame- 
ter controls the barrier height that separates the two 
states and thus the rate of transitions. 



1 



(Axi 



(18) 



with an analogous definition for U^{x). The distance 
between residues i and j, Axij = \xi — Xj\. Distances be- 
tween residues calculated from the reference state struc- 
tures A and B are defined as Ax^ = \xf — a;^| and 
Axfj = \xf — x^\, respectively. The contact matrices 
D^j and Df^ determine which pairs of residues are con- 
nected via harmonic linkages, are given by 



1, 



0, otherwise 



(19) 



and similarly for D^, where d'^ is a cutoff distance. The 
force constants, kij are modulated by the difference in 
pairwise residue-residue distances in A and B as 



Finally, we consider the allosteric transition between 
the inactive and active conformations of the nitrogen reg- 
ulatory protein C receiver domain (NtrC). Th is sys tem 
has been studied previou sly us ing both all-atonP^miand 
coarse-grained simulatiorP^E^. For the purpose of test- 
ing the WE string method, we choose to follow the latter 
approach, and guided by those studies, construct a two- 
state elastic network model of the protein. 

Elastic network models provide a reduced represen- 
tation of the protein, where only the Ca atom of the 
backbone is explicitly included. The interaction between 
these coarse-grained sites is governed by harmonic po- 
tentials that stabilize one or more reference conforma- 
tions, often the experimentally determined native con- 
formation. While elastic network models lack the chem- 
ical fidelity of all-atom models, they have been shown in 
some instances to recapitulate important conformational 
fluctuations of less simplistic model^^^^l^. 

Following the model construction detailed in Refs. [16] 
and UHl we build a two-state elastic network model of 
NtrC to study the transition from the inactive to ac- 
tive states of the protein upon phosphorylation. These 
states are generated from the 124 Cq positions in the 
NMR structurePJ (PDB IDs 1DC7 and 1DC8, for state 
A and B^ respectively) and are shown in Fig. |8] The 
potential energy of the protein is specified by its instan- 
taneous conformation, x £ R^^'^ , where M is the number 
of residues in the model and Xi denotes the position of 
the i**^ residue. Specifically, 

U{x) - --^In (e-/'-^'^-) + e-'3'"^"(-)) + C/«(aa7) 

where U^{x) and U^{x), defined below, are the indi- 
vidual elastic network model energies for reference states 



kij — min 



{Axt,-Axf^^■^ 



(20) 



The final term in Eq. 17 provides a hard core repulsion 



that prevents steric overlap between residues, and is given 
by 



M 



U^{x)^e J2 



a 



12 



(21) 



The parameterization of the potential is the same as in 
Ref. nil d"^'^ = 11.5 A, Efe = 0.5 kcal mor\ fc„iax = 
0.2 kcal mor^A"^, e = 1.0 kcal mor\ a = 1.0 A, 13^ = 
0.02 kcal mol~^, and the masses of all particles, m = 
100 amu. As in Ref. [181 we used a modified value of 
13m that differs from the original model given in Ref. 1161 
where was set to 0.005 kcal moP^. 

We then introduce a set of collective coordinates 
6{x) = Rx + X, which we will use to assign each confor- 
mation to an image along the string. The coordinate 9 is 
of the same dimensionality as our original system, but re- 
moves the degeneracies due to translation and rotation of 
the system. In the above definition, R is the rotation ma- 
trix and X is the translation vector that when applied to 
X minimizes \d{x) — a^icf]. The distance metric in the col- 
lective coordinate space is the root mean square deviation 
(RMSD) of the optimally aligned conformation 9{x) with 
some reference coordinates of the protein, x^-d. Here, the 
RMSD is calculated using the fast quaternion-based char- 
acteristic polynomial methocP^, which does not require 
the explicit calculation of R. The two residues at both 
the C- and N-Termni of the protein are highly mobile 
and are excluded from the RMSD calculation. 
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The string is initiated as the hnear interpolation be- 
tween A and B using 40 images {Nim = 40). Forty reph- 
cas {Nrep — 40) were initiated in the bins correspond- 
ing to A and B with each repUca given equal weight. 
The string was free to move during the first fOGO r of 
the simulation and then was fixed at its converged po- 
sition to collect statistics for the path. An equilibrium 
re- weighting procedure was applied during the first f 500 
T of the simulation every 10 r. An additional 3000 r 
steps of the WE method were simulated with the con- 
verged string to collect statistics. The total simulation 
time was 4500 t. 

The free energy associated with the path defined by 
the converged string is shown in Fig. [9j The free en- 
ergy in each Voronoi bin, Gq , is calculated from the WE 
simulation directly as Ga — —ksTln (Wa), where Wa is 
the average weight assigned to bin a. Statistical errors 
for averages of the timeseries data used to calculate Wa 
were estimated by autocorrelation analysi^^ using the 
timeseries module of the pyMBAR codeP. The free 
energy along the string is shown in Fig. |9] and agrees 
with the barrier height and relative stability between A 
and B to within 1 kcal/mol of the results calculated from 
the FTS method (Figure 11 in Ref-HH]). 

While we extended the simulation 3000 r beyond the 
phase when re-weighting was applied, an accurate es- 
timate of the free energy difference between A and B 
(within the 95 % confidence interval) was obtained dur- 
ing the first 500 r of the simulation. During this same pe- 
riod the barrier height differed from the converged value 
by less than 1.3 kcal/mol. 

While the free energy shows a distinct barrier along 
the path separating the inactive and active states, it is 
important to determine whether the converged path is 
dynamically relevant, and if the barrier corresponds to 
a mechanistic transition state. To this end, we calculate 
the committor probability^ for each bin, a, along 
the string, 5+, defined as the probability that fleeting 
trajectories initiated from that bin reach state B before 
state A. If the string accurately describes the dynamical 
reaction pathway, (7+ — 1/2 should coincide with the 
barrier with neighboring bins rapidly asymptoting to zero 
and unity on either side of the barrier. 

For each bin, we select 500 random conformations 
drawn from the final 2500 r iterations of the WE simu- 
lation. A hundred conventional simulations are initiated 
from each conformation with initial velocities drawn ran- 
domly from a Boltzmann distribution at the same tem- 
perature as the WE simulations. These simulations are 
propagated until the trajectory reaches state A or B, 
and the terminating state is recorded. A trajectory ter- 
minates in state A (B) when its RMSD is < 2 A from 
the reference conformation of A {B) and > 3 A from B 
(A). 

Alternatively, the committor probability along the 
string can also be calculated directly from transition 
statistics gathered during the WE simulations by solv- 



ing the following system of equation^^ 

-it+Y. ^^fc'^fe =-J2 '^^^ fo^' (22) 

fee/ k£B 

where qf is the probability that from bin i the system will 
reach state B before returning to state A, and Tij is the 
probability of going from bin i to j, which is calculated 
directly from the WE simulation. Intermediate states, /, 
are those states which are not in A or B. 

Committor probabilities calculated from both methods 
are shown in Fig. [lO] and are nearly identical over all of 
the bins. The q+ — 1/2 bins coincide with the peak of the 
barrier in Fig. [9] between a = 18 and 19. Representative 
conformations from these bins are shown in the center 
panel of Fig. |8] Taken together with the observation 
that the underlying distributions for each bin, P(q^), 
are unimodal, the converged string appears to describe 
the reactive pathway between the inactive and actives 
states for this model of NtrC. 



IV. DISCUSSION 

We have introduced a variant of the WE method to 
sample transitions along a one-dimensional path embed- 
ded in the space of an arbitrary number of order param- 
eters. As in the finite-temperature string method-^- on 
which our method is based, an equidistantly spaced set 
of images along the path define a set of Voronoi cells 
that are analogous to the subdivision of phase space into 
bins in earlier versions of the WE method. The images 
adaptively evolve toward the average weighted position 
of replicas that migrate through each bin during the WE 
simulation, subject to a smoothing reparameterization. 
The use of a dynamic set of bins that change in time dur- 
ing a WE simulation had been previously suggestecPHESl^ 
and here we demonstrate its practical use to confine sam- 
pling to the region of order parameter space along the 
transition pathway. 

Unlike in variants of the string method where the 
string images are used to either initiate replicas of the 
systerrP^l2il, or to perform constrained sampling along a 
hyperplane coincident with strinjl^, here the path only 
serves to partition phase space. Its utility when com- 
bined with the adaptive update scheme is that it allows 
the WE method to enhance sampling along the transition 
tube between states of interest. In doing so, the string 
and associated Voronoi bins do not perturb the natural 
dynamics of the system. 

The path of the string does, however, play an im- 
portant role in determining the efficiency of the WE 
method. Partitioning progress coordinate space along 
a pseudo one-dimensional curve converts the computa- 
tion from one where the cost scales exponentially with 
the number of order parameters to one which is linear 
in the number of images along the string. When a large 
number of progress coordinates are required to effectively 



9 



sample a complex system, this change in the scaling be- 
havior is likely to drastically reduce the computational 
cost of performing a WE simulation. Additionally, so 
long as the string accurately approximates the dominant 
path of reactive flux, the majority of transitions along the 
string will occur between neighboring Voronoi bins. The 
reduced number of relevant bin interfaces across which 
transitions occur should decrease statistical errors in the 
rates estimates used in Eq. [7] This likely increases the 
efficiency of the re- weighting procedure in Sec. |II Cj com- 
pared to its use with other binning regimes previously 
used with WE sampling. 

The effectiveness of the string method as a way of par- 
titioning phase space is subject to the assumption that 
the width of the transition tube about the string is small 
compared to its radius of curvaturd^. This assumption 
is a well-documented limitation of the string method that 
precludes its use for systems with multiple reactive chan- 
nels or many important metastable states connected via 
a meshwork of isolated transition pathways. The use of 
multiple strings to sample each pathway (using a mod- 
ified version of the dual-direction scheme in Sec. IIIBl, 



or pairwise between well-defined states could circumvent 
this limitation, although accurately positing the presence 
and location of multiple pathways a priori for complex 
systems in almost all cases is non-trivial. 

In selecting the two driven 2D systems in sections [ill A| 
and IIIB we sought to provide a direct comparison be- 
tween our meth od an d the NEUS-based string method 
of Dickson, et aP^EH, We carefully attempted to repli- 
cate the implementation of the underlying Brownian dy- 
namics, as well as their convergence analysis to test the 
efficiency of the WE method. Since the conventional dy- 
namics in this study and the NEUS papers agree within 
statistical variation, the results presented in Figs. 3[6 and 
[7] should be directly comparable. For both example sys- 
tems, WE performs admirably and appears to show sim- 
ilar or better convergence characteristics than NEUS. It 
is important to note, however, that in this comparison, 
the WE simulations use a higher density of replicas than 
in the NEUS studies. In NEUS, a single replica is simu- 
lated per bin; in the WE simulations many replicas are 
run per bin, although each replica is run for a significantly 
shorter time than the replicas in the NEUS simulations 
per iteration. We generally observe a super-linear gain 
in efficiency over some range of increasing the number of 
replicas and/or bins in the system (e.g. increasing the 
number of bins by a factor of 2 speeds convergence by 
> 2 times). Increasing the number of bins likely reduces 
the discretization errors associated with estimating the 
rates used in Eq^^ during the re-weighting step early 
in the simulation^ , and increasing the number of repli- 
cas reduces the variance in the inter-bin transition rates. 
Using multiple replicas per sampling region, which was 
suggested in a recent application of the NEUS to the 
nonequilibirum folding and unfolding of coarse-grained 
model of RNA'^, in concert with a finer discretization of 
phase space, may ameliorate the performance differences 



between the two methods. 

Finally, we have presented a new string method based 
on sampling obtained via the Weighted Ensemble method 
that appears to have many advantages over previous 
methods. WE sampling is easy to implement both in 
terms of the data structures needed to track the repli- 
cas of the system and also by not requiring any special 
modification of the underlying dynamics to restrain repli- 
cas to a particular region of phase s pace, ei ther through 
momentum reversal at a boundaryiSIlMSl qj- soft-wall 
restraint&5S^. Additionally, it is not necessary to generate 
physical replicas along the initial string from the start 
state to the final state as it is in most other string meth- 
ods. Often the initial string is generated using a sim- 
ple linear interpolation, targeted molecular dynamicJ^^ 
or with a coarse-grained modeP^, all of which can lead 
to string images corresponding to unphysical intermedi- 
ates for many systems of interest. With the WE-based 
method, all replicas can be initiated in the start state 
and the string will evolve based on the natural dynam- 
ics of the system, without ever starting replicas based 
on these unphysical conformations. In this regard, our 
method shares some similarities with the growing string 
methocP^. Despite this, some care should be taken when 
generating the initial path of the string. The calculation 
could still become trapped if the initial path is separated 
from the dominant pathway by a large barrier orthog- 
onal to the string. Lastly, as we have shown here, the 
WE-based string method can be drastically more effi- 
cient than conventional sampling for calculating the rates 
and steady-state distributions for a range of equilibrium 
and nonequilibrium problems. Our method also outper- 
forms the NEUS rare-event sampling method for two of 
the example systems studied here, but this result may 
depend on the system being simulated or it may be pos- 
sible to tune the NEUS parameters to increase perfor- 
mance. This work lays the foundation for applying the 
WE-based string method to simulating rare transitions in 
more complex and realistic systems, especially when one 
or a small number of progress coordinates are insufficient 
to fully characterize the reaction coordinate. 
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FIG. 1. The two-dimension periodic potential surface for 
a = 1.125, with the converged path of 20 centers (black dots) 
and corresponding Voronoi cells. Contour lines are separated 
by kT, and the corresponding color scale is shown in the same 
units. In each Voronoi cell, a random sample of the instanta- 
neous positions visited by the replicas is shown in alternating 
light and dark gray dots for clarity. 
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FIG. 2. Projections of the steady-state distributions for the 
two-dimensional periodic potential onto the y axis. Solid lines 
show the distributions for the common and rare parameter 
sets obtained using conventional sampling. 
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FIG. 3. Convergence of the steady-state distributions pro- 
jected onto the y axis for both conventional and WE sim- 
ulations for the two-dimensional periodic potential where 
a = 1.125 (top) and a — 2.25 (bottom). For both the con- 
ventional and WE simulations, the errors are averages over 
individual error curves calculated for ten independent simu- 
lations using Eqs. (131 and (14 1. The distributions for the 



WE simulations do not include statistics from the first 50r. 
The target distribution for the shallow potential (top) is cal- 
culated from a single long conventional simulation 4.0 x 10^ 
steps in length. For the deep potential (bottom), the target 
distribution was obtained by averaging ten conventional sim- 
ulations of 2.0 X 10^" steps. In each case, separate simulations 
were used to calculate the error and the target distributions. 
The solid circle marks the average time at which all histogram 
windows were populated for the ten simulations. 
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FIG. 4. The two-dimensional ring potential with two pathways, (left panel) An instantaneous snapshot of all of the active 
replicas during a representative iteration, where replicas that have last visited A are shown in red, and those that last visited 
B shown in blue. The circles, centered at (-3,0) and (3,0), show the boundaries of the A and B states, (center and right 
panels). The center and right panels show the converged strings with the corresponding Voronoi cells for the A^B, and B~^A 
transitions, respectively. The circles delineate A and B as in the left panel. Contour lines are separated by kT, and the 
corresponding color scale is show in the same units. 
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FIG. 5. Projections of the steady-state distribution for the 
two-dimensional ring potential onto 9 for /3=1.0,1.5,2.0 and 
2.5, obtained using conventional (lines) and weighted ensem- 
ble (circles) simulations. The probability of finding a particle 
in either of the two metastable intermediates a.t 9 = —it/2 or 
7r/2 decreases with increasing /3. 
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FIG. 6. Convergence of the A— >_B rate constant for the two- 
dimensional ring potential where /3 = 1.5 (top) and 2.5 (bot- 
tom). For both conventional and WE simulations, the errors 
are calculated using Eq. |16[ and are the averages of the error 
curves of ten independent simulations. For the conventional 
simulations, an estimate of the error cannot be obtained until 
the first transition from A to i? is observed. 
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FIG. 7. Efficiency of the WE method for the two-dimensional 
ring potential as a function of inverse temperature, /3. Effi- 
ciency is defined as the ratio of the total aggregate simulation 
time, Tx, to the system's natural MFPT: Tx/MFPT, where 
Tx is the time required to achieve a desired error, X. Thus, 
when the efficiency is 1 the total simulation time is the time 
required for the MFPT, which may be extremely long. We 
carried out this analysis for an order of magnitude error esti- 
mate (10^) of the rate Ti and a factor of three error estimate 
(10°-^ ~ 3) of the rate Tq.b 



16 




FIG. 8. The inactive (left) and active (right) conformations of the NtrC^ receiver domain, generated from PDB IDs 1DC7 and 
1DC8, respectively, (center) Ten conformations taken from the Voronoi bin corresponding most closely to the — 0.5 (bin 
index 18). Each conformation is the average of 5 randomly selected snapshots. Coloring is based on residue indices starting 
from blue at the N-terminus and ending with red at the C-terminus. 
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FIG. 9. Free energy Ga associated with each Voronoi tessela- 
tion along the string for the elastic network model of NtrC^. 
The dark shaded region denotes the 95% condence interval 
(two standard errors) for the free energy averaged over the 
last 3000 r of the WE simulation. 




FIG. 10. Committor probability along the string for the elas- 
tic network model of NtrC^. The average committor proba- 
bility of reaching state B before state A, q"*", calculated from 
an ensemble of 500 conformations in each bin is shown as a 
solid line, with error bars equal to the standard deviation of 
the sample. Committors calculated from the WE simulation 
using Eq. [22]are shown as open circles. 
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TABLE 1. Parameters used in the WE simulations for the TABLE II. Parameters used in the WE simulations for the 
periodic two-dimensional potential. two-dimensional ring potential. 
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